Fix creating a lot of ephemeral ports when stopping GlassFish. Fix restart on fast machines #25300

dmatej · 2024-12-28T16:15:08Z

There were several issues; see the individual commits. The main problem was that startup can be faster than shutdown, so the new and the old instance could collide on ports and files. The most problematic was the debug port, which stays open from JVM startup until the very end of the process.

On my new machine the failure was reproducible in roughly 80% of executions.

Solution for #25292:

Instead of busy-spinning on the remote port, we now open a single connection and wait until that connection is disconnected.
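A minimal sketch of that idea, not the code from this PR: connect once and block on a read until the peer closes the socket, so no new ephemeral port is consumed per probe. The class and method names (DisconnectWaiter, waitForDisconnect) are hypothetical, and the sketch assumes the server does not send an endless stream of data on that port.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class DisconnectWaiter {

    /**
     * Connects once and blocks until the remote side closes the connection.
     * Returns true if the peer disconnected or was not reachable at all,
     * false if connecting or a single blocking read timed out.
     */
    public static boolean waitForDisconnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            // The timeout applies to each blocking read, not to the whole wait.
            socket.setSoTimeout(timeoutMillis);
            InputStream in = socket.getInputStream();
            // read() returns -1 at end of stream, i.e. when the peer closes.
            while (in.read() != -1) {
                // Ignore any data; only the close event matters here.
            }
            return true;
        } catch (SocketTimeoutException e) {
            return false;
        } catch (IOException e) {
            // Connection refused or reset: nothing is listening any more.
            return true;
        }
    }
}
```

Compared with repeatedly opening and closing probe connections, this uses exactly one local port for the whole wait, and the operating system signals the close instead of the caller polling for it.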

#25295 also required the following:

- Move the startup to shutdown hooks.
- The startup hook waits for the end of the other GlassFish shutdown hooks (detected by name).
- Logging is explicitly stopped.
- For extreme cases I added extra logging for the dying and the starting process, which can be enabled by setting an environment variable: export AS_RESTART_LOGFILES=true
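The "detected by name" part can be sketched roughly like this. It is an illustration under assumptions, not the StartServerHook implementation: the prefix "GlassFish Shutdown Hook" and the names HookOrdering, findThreads and awaitThreads are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class HookOrdering {

    /** Hypothetical name prefix; the real GlassFish hook names may differ. */
    public static final String HOOK_PREFIX = "GlassFish Shutdown Hook";

    /** Returns live threads, other than the caller, whose name starts with the prefix. */
    public static List<Thread> findThreads(String prefix) {
        List<Thread> result = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t != Thread.currentThread() && t.getName().startsWith(prefix)) {
                result.add(t);
            }
        }
        return result;
    }

    /** Joins every matching thread, waiting at most timeoutMillis for each one. */
    public static void awaitThreads(String prefix, long timeoutMillis)
            throws InterruptedException {
        for (Thread t : findThreads(prefix)) {
            t.join(timeoutMillis);
        }
    }
}
```

A hook registered with Runtime.getRuntime().addShutdownHook(new Thread(task, HOOK_PREFIX + " - cleanup")) would be found by this scan once it is running. Note that the JVM starts shutdown hooks in an unspecified order, so a single scan can miss a hook that has not been started yet; the sketch does not handle that case.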

Note

See https://bugs.openjdk.org/browse/JDK-8284282, which applies to our Jenkins and to most Docker containers too. Terminated GlassFish instances become zombies, so handle#isAlive keeps returning true and onExit.get stays blocked.
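To illustrate the hang, here is one generic way to keep such a wait bounded. This is not what the PR does; it only shows that ProcessHandle.onExit() returns a CompletableFuture, so a timeout can be applied instead of an unbounded get(). The names BoundedExitWait and awaitExit are hypothetical.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedExitWait {

    /**
     * Waits for the process to exit, but never longer than the given timeout.
     * Returns true if the exit was observed, false if the timeout expired.
     */
    public static boolean awaitExit(ProcessHandle handle, long timeout, TimeUnit unit)
            throws InterruptedException {
        try {
            handle.onExit().get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            // Either the process is still running, or it is an unreaped zombie.
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("Waiting for process exit failed", e);
        }
    }
}
```

A false result cannot tell a live process from a zombie, so the caller still has to decide what to do after the timeout. The handle must not be ProcessHandle.current(), because onExit() is not allowed for the current process.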

- The start succeeded too early and on fast machines collided with the shutdown.
- A shutdown hook is really the last thing in the JVM capable of doing it.
- All shutdown hooks have names now.

Signed-off-by: David Matějček <[email protected]>

…cases

- When the current (old) JVM had debugging enabled, the new one sometimes failed to start. It is not possible to wait from the inside.
- Stop the kernel after adding the last shutdown hook; shutdown hooks run in parallel, but we have to ensure that ours is executed after all other non-daemon hooks finish.
- export AS_RESTART_LOGFILES=true to get "old" and "new" files in the server's log directory. It is a trivial workaround, because the standard logging system might get into a conflict with the new GF instance too.
- The "super debug" is not helpful, as it affects timing.

Signed-off-by: David Matějček <[email protected]>

dmatej · 2024-12-29T15:59:39Z

Eureka! Now I see why we had that weird code checking info(): Jenkins/k8s/Docker don't reap zombies, and then onExit().get() blocks forever.
https://bugs.openjdk.org/browse/JDK-8284282?jql=status%20in%20(Closed%2C%20Submitted)%20AND%20text%20~%20"ProcessHandle"

- Reverted the usage of ProcessHandle.onExit.get, as it doesn't work in containers which don't have a strict reaper. The zombie process is considered alive, and get then hangs forever.
- Added waitpid; however, it is not installed everywhere. If it is missing, we sleep for 1 second instead. That should be enough for the operating system to do the cleanup.

Signed-off-by: David Matějček <[email protected]>
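The "external command, else sleep" fallback described in this commit message might look roughly like the sketch below. It is an assumption-laden illustration, not the committed code: the names WaitPidFallback and waitForPid are hypothetical, and the command is passed in as a parameter with the PID as its only argument.

```java
import java.io.IOException;
import java.time.Duration;

public class WaitPidFallback {

    /**
     * Runs an external "wait for pid" command and blocks until it finishes.
     * If the command cannot be started, sleeps for the fallback duration instead.
     * Returns true if the command was used, false if the fallback sleep was used.
     */
    public static boolean waitForPid(long pid, String command, Duration fallback)
            throws InterruptedException {
        try {
            Process waiter = new ProcessBuilder(command, Long.toString(pid))
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
            // The exit code is ignored; the command also ends if the PID is already gone.
            waiter.waitFor();
            return true;
        } catch (IOException e) {
            // The command is not installed: give the OS some time to clean up instead.
            Thread.sleep(fallback.toMillis());
            return false;
        }
    }
}
```

A call such as waitForPid(oldPid, "waitpid", Duration.ofSeconds(1)) would mirror the one-second fallback mentioned above; oldPid is a placeholder for the PID of the stopping server.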

nucleus/core/kernel/src/main/java/com/sun/enterprise/v3/admin/StartServerHook.java

OndroMih · 2024-12-29T22:09:07Z

I renamed this PR so that it reflects all things that were fixed. The title will appear in the release notes so it's good if it clearly describes all that this PR adds/fixes.

dmatej added 4 commits December 28, 2024 01:16

The rotationTimer should be a daemon thread

e95d599

Signed-off-by: David Matějček <[email protected]>

Deleted JavaClassRunner

d9c27a5

- Its only usage was for the domain restart, which was reimplemented.

Signed-off-by: David Matějček <[email protected]>

dmatej added the bug Something isn't working label Dec 28, 2024

dmatej added this to the 7.0.21 milestone Dec 28, 2024

dmatej requested review from avpinchuk and a team December 28, 2024 16:15

dmatej self-assigned this Dec 28, 2024

dmatej added 2 commits December 28, 2024 17:53

RestTestBase - improved confusing log on test error

77b4cf4

- Backup of the server.log cannot be done if the server is dead.
- Using System.Logger instead of JUL.

Signed-off-by: David Matějček <[email protected]>

Improved effectivity of waiting for stop/start instance actions

21fe61c

- Original code caused local port exhaustion.
- Original code used busy spinning instead of signals.

Signed-off-by: David Matějček <[email protected]>

dmatej force-pushed the fix-restart-on-fast-machines branch 14 times, most recently from 24b4717 to 3c040b2, December 29, 2024 15:59

dmatej linked an issue Dec 29, 2024 that may be closed by this pull request

Ephemeral ports are exhausted if stopping the DAS takes a long time #25292

Closed

dmatej force-pushed the fix-restart-on-fast-machines branch from 3c040b2 to 190fd0b, December 29, 2024 16:12

dmatej marked this pull request as ready for review December 29, 2024 17:42

avpinchuk reviewed Dec 29, 2024

nucleus/core/kernel/src/main/java/com/sun/enterprise/v3/admin/StartServerHook.java (two review comments, both resolved)

OndroMih approved these changes Dec 29, 2024

OndroMih changed the title ~~Fix restart on fast machines~~ Fix creating a lot of ephemeral ports when stopping GlassFish. Fix restart on fast machines Dec 29, 2024

avpinchuk approved these changes Dec 29, 2024

arjantijms merged commit 3252cc5 into eclipse-ee4j:master Dec 30, 2024
3 checks passed

dmatej deleted the fix-restart-on-fast-machines branch December 30, 2024 09:17
